Emergency Medicine Journal — Latest Matching Preprints

1

Moving diagnostics upstream: prehospital blood gas analysis is associated with safe community care and improved patient selection for hospital admission

Lux, H.; Roth, J.; Hemmer, S.; Lang, S.; Lewejohann, J.-C.; Bauer, M.; Brock, J.; Dickmann, P.

2026-04-03 emergency medicine 10.64898/2026.04.01.26349943 medRxiv

Top 0.1%

62.0%

Background Emergency departments (EDs) in high-income countries face rising demand, workforce shortages and crowding. We investigated whether prehospital point-of-care blood gas analysis (BGA), used by emergency physicians, is associated with higher ambulatory treatment rates and improved patient selection for hospital admission. Methods We retrospectively analysed routinely collected data from a pilot implementation of a mobile blood gas analyser in physician-staffed emergency medical services (EMS) in Jena, Germany (July 2023 to May 2024). Adult emergency patients receiving prehospital BGA were compared with propensity score-matched EMS controls without BGA. Primary outcomes were the proportion treated on scene and, among transported patients, the hospital admission rate. Secondary outcomes were 30-day safety among ambulatory patients and associations between BGA parameters and disposition. We used standardised mean differences to assess balance and receiver operating characteristic analysis for lactate thresholds. Results Of 109 patients receiving prehospital BGA, 98 met inclusion criteria after excluding 9 patients with missing NACA scores, 1 on-scene death and 1 invalid age record; these were matched to 390 controls (total n = 488). Baseline demographics, severity and vital signs were well balanced. Ambulatory treatment was markedly higher in the BGA cohort compared with matched controls (27.6% vs 8.7%; OR 3.98, 95% CI 2.26 to 7.01; p<0.001). No ambulatory BGA patient required ED re-attendance or repeat EMS contact within 30 days. Among transported patients, 58% in the BGA cohort were admitted to hospital, compared with an overall regional ED conversion rate of approximately 30%. Lactate ≥2.6 mmol/L was the most influential parameter for disposition decisions, with elevated lactate and acid-base disturbances strongly associated with transport and admission.
Conclusion Prehospital BGA was associated with fourfold higher ambulatory treatment rates (27.6%) and a twofold higher ED conversion rate among the patients who were transported (58%), indicating improved risk stratification and resource allocation. These findings suggest that integrating objective biochemical data into prehospital assessment may enhance treat-and-refer decision-making and support more efficient use of limited emergency care capacity.
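
As a rough check on the headline comparison in the abstract above, the sketch below recomputes an unadjusted odds ratio with a Wald 95% interval. The cell counts (27 of 98 and 34 of 390) are not stated in the abstract; they are assumptions back-calculated from the reported 27.6% and 8.7%, and the function name `odds_ratio_wald` is illustrative. The authors' own method for the matched sample may differ.

```python
import math


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval for a 2x2 table.

    a, b: events and non-events in the exposed group
    c, d: events and non-events in the control group
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# ASSUMED counts, back-calculated from the reported percentages:
# 27/98 = 27.6% treated on scene with BGA, 34/390 = 8.7% in matched controls.
bga_ambulatory, bga_total = 27, 98
ctl_ambulatory, ctl_total = 34, 390

or_est, ci_low, ci_high = odds_ratio_wald(
    bga_ambulatory,
    bga_total - bga_ambulatory,
    ctl_ambulatory,
    ctl_total - ctl_ambulatory,
)
print(f"OR {or_est:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

Under these assumed counts the result lands close to the reported OR of 3.98 (2.26 to 7.01), which suggests the published estimate is an unadjusted comparison on the matched sample.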

2

Acute Hyperkalemia and 30-Day Mortality: Increased Mortality at Slightly Elevated Plasma Potassium Levels

Egeberg, F.; Nygaard, H.; Grand, J.; Itenov, T. S.; Lindquist, M.; Folke, F.; Christensen, H. C.; Lundager-Forberg, J.; Sajadieh, A.; Petersen, J.; Haugaard, S. B.; Mottlau, R. G.

2026-04-11 emergency medicine 10.64898/2026.04.10.26350589 medRxiv

Top 0.1%

41.0%

Background: Potassium is involved in multiple physiological processes in the body, and hyperkalemia is a common, potentially life-threatening condition. Objective: The aim of our study was to examine the association between plasma potassium levels and 30-day mortality in patients presenting to an emergency department with normo- or hyperkalemia. Design: Retrospective cohort study. Setting: Emergency Departments in the Capital Region of Denmark. Participants: Persons attending Emergency Departments in the Capital Region of Denmark from 2017–2021 with a plasma potassium level of at least 3.5 mM measured within 4 hours after arrival. Measurements: The study was based on data from Danish National Registries and electronic patient records. We performed Kaplan-Meier survival analyses and unadjusted and adjusted Cox regression analyses utilizing plasma [K+] 3.5–4.4 mM as the reference group for 30-day mortality hazard ratios (HRs). Results: A total of 248,453 patients were included with a median age of 60 years (Q1;Q3 42;75), and 6,959 (2.8%) died within 30 days. Mortality was 2.2% for potassium level 3.5–4.4 mM, 6.9% for 4.5–4.9 mM, 17.1% for 5.0–5.9 mM, and 26.9% for ≥6.0 mM. Unadjusted 30-day HRs were 3.2 (95%CI: 3.0–3.4) for [K+] 4.5–4.9 mM, 8.6 (95%CI: 7.9–9.3) for [K+] 5.0–5.9 mM, and 14.7 (95%CI: 12.5–17.0) for [K+] ≥6.0 mM. Adjusted HRs were 1.4 (1.3–1.5), 2.10 (1.9–2.3), and 2.4 (2.0–2.8), respectively. Limitations: Risk of residual confounding. Missing data. No access to data regarding in-hospital treatment. Conclusion: Plasma potassium levels above 4.4 mM were associated with increased 30-day mortality among patients presenting to emergency departments. Primary funding source: Department of Emergency Medicine, Copenhagen University Hospital, Bispebjerg and Frederiksberg Hospital.

3

Easily Scalable, Rapidly Deployable Mechanical Ventilator For Pandemic Health Crises In Resource-Limited Areas

Farre, R.; Salama, R.; Rodriguez-Lazaro, M. A.; Kiarostami, K.; Fernandez-Barat, L.; Oliveira, V. D. C.; Torres, A.; Farre, N.; Dinh-Xuan, A. T.; Gozal, D.; Otero, J.

2026-04-11 emergency medicine 10.64898/2026.04.08.26350386 medRxiv

Top 0.1%

15.5%

Background The COVID-19 pandemic exposed critical shortages of mechanical ventilators, particularly in low-resource settings. Disruptions in global supply chains and dependence on specialized components highlighted the need for scalable, locally manufacturable alternatives for emergency respiratory support. Aim To describe and evaluate a simplified, supply-chain-independent mechanical ventilator assembled from widely available automotive and simple hardware components, and intended as a last-resort solution. Methods The ventilator is based on a reciprocating air pump driven by an automotive windshield wiper motor coupled to parallel shaft bellows and readily assembled passive membrane valves, only requiring materials available from standard hardware retailers, minimal tools, and basic manual skills. Ventilator performance was assessed through bench testing using a patient model simulating severe lung disease in adult (R=20 cmH2O·s/L, C=15 mL/cmH2O) and pediatric (R=50 cmH2O·s/L, C=10 mL/cmH2O) patients. Realistic proof of concept was performed in four mechanically ventilated 50-kg pigs. Results The device delivered tidal volumes up to 600 mL and respiratory rates up to 45 breaths/min with PEEP up to 10 cmH2O, covering pediatric and adult ventilation ranges. In vivo testing showed that the ventilator maintained arterial blood gases within the targeted range. Technical details for ventilator construction are provided in an open-source video tutorial. Discussion This low-cost ventilator demonstrated adequate performance under demanding conditions. Although not a substitute for commercial intensive care ventilators, its simplicity, autonomy, and independence from fragile supply chains provide a potentially life-saving option in resource-constrained emergency scenarios.

4

Design and preliminary safety validation of a hybrid deterministic-AI triage system for multilingual primary healthcare: a WhatsApp-based vignette study in South Africa

Nkosi-Mjadu, B. E.

2026-04-22 health informatics 10.64898/2026.04.21.26349781 medRxiv

Top 0.1%

7.0%

Background South Africa's public healthcare system serves most of the population through approximately 3,900 primary healthcare clinics characterised by long waiting times and high volumes of repeat-prescription visits. No published pre-arrival digital triage system operates across all 11 official South African languages while aligning with the South African Triage Scale (SATS). This paper reports the design and preliminary safety validation of BIZUSIZO, a hybrid deterministic-AI WhatsApp triage system. Methods BIZUSIZO delivers SATS-aligned triage via WhatsApp, combining AI-assisted free-text classification (Claude Haiku 4.5) with a Deterministic Clinical Safety Layer (DCSL) that overrides AI output for 53 clinical discriminator categories (14 RED, 19 ORANGE, 20 YELLOW) coded in all 11 official languages and independent of AI availability. A five-domain risk factor assessment can only upgrade triage level. One hundred and twenty clinical vignettes in patient language (English, isiZulu, isiXhosa, Afrikaans; 30 per language) were scored against a developer-assigned gold standard with independent blinded nurse review. A 121-vignette multilingual DCSL safety consistency check across all 11 languages and a 220-call post-hoc framing sensitivity evaluation (110 paired vignettes) were also conducted. Results Under-triage was 3.3% (4/120; 95% CI: 0.9%-8.3%) with no RED under-triage; exact concordance was 80.0% (96/120) and quadratic weighted kappa 0.891 (95% CI: 0.827-0.932). One two-level under-triage was observed on a non-RED presentation (V072, isiXhosa burns vignette, ORANGE→GREEN); one two-level over-triage was observed (V054, isiZulu deep laceration, YELLOW→RED). In the framing sensitivity evaluation, AI-only classification achieved 50.9% RED invariance under adversarial framing; full-pipeline classification achieved 95.0% in four validated languages, with the DCSL rescuing 18 of 23 AI drift cases.
Conclusions A hybrid deterministic-AI triage system with DCSL-based emergency detection achieved zero RED under-triage and consistent RED detection across all 11 official languages. The 16.7% over-triage rate falls within published South African SATS ranges (13.1-49%). A single two-level under-triage event was observed on an isiXhosa burns vignette (ORANGE→GREEN) and is discussed in Limitations. Findings are preliminary; prospective validation against independent nurse triage is the necessary next step.
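
The under-triage interval quoted above (4/120; 95% CI 0.9%-8.3%) looks like an exact binomial interval. The sketch below shows one way such an interval could be obtained, using a Clopper-Pearson construction solved by bisection with only the standard library. The choice of Clopper-Pearson is an assumption (the abstract does not name the method), and `binom_cdf`, `_solve` and `clopper_pearson` are illustrative names.

```python
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))


def _solve(f, target, increasing, tol=1e-10):
    """Bisection on [0, 1] for a monotone function f with f(p) = target."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x, n, alpha=0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion x/n."""
    # Lower bound: P(X >= x | p) = alpha/2, which increases with p.
    lower = 0.0 if x == 0 else _solve(
        lambda p: 1 - binom_cdf(x - 1, n, p), alpha / 2, increasing=True
    )
    # Upper bound: P(X <= x | p) = alpha/2, which decreases with p.
    upper = 1.0 if x == n else _solve(
        lambda p: binom_cdf(x, n, p), alpha / 2, increasing=False
    )
    return lower, upper


# Counts taken from the abstract: 4 under-triaged vignettes out of 120.
ci_low, ci_high = clopper_pearson(4, 120)
print(f"{4 / 120:.1%} (95% CI {ci_low:.1%} to {ci_high:.1%})")
```

With only four events, a normal-approximation interval would be poorly behaved near zero, which is one reason an exact method is a plausible choice here.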

5

Improving Care by FAster risk-STratification through use of high sensitivity point-of-care troponin in patients presenting with possible acute coronary syndrome in the EmeRgency department (ICare-FASTER): a stepped-wedge cluster randomized trial

Than, M.; Pickering, J. W.; Joyce, L. R.; Buchan, V. A.; Florkowski, C. M.; Mills, N. L.; Hamill, L.; Prystowsky, J.; Harger, S.; Reed, M.; Bayless, J.; Feberwee, A.; Attenburrow, T.; Norman, T.; Welfare, O.; Heiden, T.; Kavsak, P.; Jaffe, A. S.; Apple, F.; Peacock, W. F.; Cullen, L.; Aldous, S.; Richards, A. M.; Lacey, C.; Troughton, R.; Frampton, C.; Body, R.; Mueller, C.; Lord, S. J.; George, P. M.; Devlin, G.

2026-04-23 cardiovascular medicine 10.64898/2026.04.21.26351433 medRxiv

Top 0.2%

3.7%

BACKGROUND Point-of-care (POC) high-sensitivity cardiac troponin (hs-cTn) testing has the potential to expedite decision-making and reduce emergency department (ED) length of stay for patients presenting with possible myocardial infarction (MI) by ensuring that results are consistently available when looked for by clinicians. We assessed the real-life effectiveness and safety of implementing POC hs-cTn testing in the ED. METHODS We conducted a pragmatic, stepped-wedge cluster randomized trial. The control arm was usual care with an accelerated diagnostic pathway utilizing a single-sample rule-out step with a central laboratory hs-cTn assay. The intervention arm used the same pathway with a POC hs-cTnI. The primary effectiveness outcome was ED length of stay assessed using a generalized linear mixed model, and the safety outcome was 30-day MI or cardiac death. RESULTS Six sites participated with 59,980 ED presentations (44,747 individuals, 61±19 years, 49.5% female) from February 2023 to January 2025, of which 31,392 presentations were in the intervention arm. After adjustment for covariates associated with length of stay, the intervention reduced length of stay by 13% (95% confidence interval [CI], 9 to 16%; P<0.001), corresponding to a reduction of 47 minutes (95%CI, 33 to 61 minutes) from a mean length of stay in the control arm of 376 minutes. The 30-day MI or cardiac death rate was similar in the control and intervention arms (0.39% and 0.39% respectively, P=0.54). CONCLUSIONS Implementation of whole-blood hs-cTnI testing at the POC into an accelerated diagnostic pathway was safe and reduced length of stay in the ED compared with laboratory testing.

6

Trade-offs in emergency transport protocols for access to hip fracture management: a geospatial analysis of selective versus standard transfer in Ontario long-term care

Yee, N. J.; Chen, T.; Huang, Y. Q.; Whyne, C.; Halai, M.

2026-04-14 orthopedics 10.64898/2026.04.12.26350713 medRxiv

Top 0.2%

3.7%

Objectives: For suspected hip fractures, prehospital protocols directing patients to an orthopaedic centre rather than the nearest emergency department (ED) could reduce time-to-surgery but may impact EMS travel burden. This study evaluates the impact of transfer protocols by quantifying transport to hospitals from long-term care (LTC) facilities across Ontario. Methods: A retrospective cross-sectional analysis of all Ontario LTC facilities and hospitals was performed. Two protocols were modeled: standard transfer to the nearest ED with subsequent transfer if required, and selective transfer based on Collingwood Hip Fracture Rule prehospital screening directly to the nearest orthopaedic services (orthoED). Median one-way travel distances were calculated from Google Maps. Results: In Ontario, 15.4% of LTC residents require hospital destination decisions because their nearest ED lacks orthopaedic services; for these facilities, median distances were 2.7km to the ED and 36.0km to the orthoED. Among the 52 LTC facilities where selective transfer was distance-optimal, it substantially reduced travel for patients with hip fracture (31.1km vs 49.6km; P<.01) while only modestly increasing travel for patients without hip fracture. Where standard transfer was distance-optimal, little travel difference was noted for patients with hip fracture; however, patients with false-positive screens traveled significantly further to an orthoED. The greatest negative consequences of selective transfer fall on the 1.3% of residents living farthest (>100km) from an orthoED. Conclusions: EMS direct transportation to hospitals with orthopaedics may improve hip fracture care but can increase EMS burden due to patients identified falsely as having a hip fracture, particularly in remote communities.

7

Most Instability Phases Resolve: Empirical Evidence for Trajectory Plasticity in Multimorbidity Care from Longitudinal Relational Monitoring

Martin, C. M.; Henderson, I.; Campbell, D.; Stockman, K.

2026-04-24 health informatics 10.64898/2026.04.22.26351537 medRxiv

Top 0.2%

3.2%

Background: The instability-plasticity framework proposes that multimorbidity trajectories periodically enter instability phases that are vulnerable to escalation but also potentially modifiable through relational intervention. Whether such phases commonly resolve without acute care, or predominantly progress to hospitalisation, has not been quantified at scale. Objective: To quantify instability window outcomes across a longitudinal monitoring cohort; to test whether the characteristics distinguishing admitted from resolved windows reflect within-patient trajectory dynamics or between-patient severity; and to characterise which patient-reported and operator-rated signals reliably precede admission, using both a curated pilot sub-cohort and the full monitoring cohort with an explicit cross-cohort comparison. Methods: Two complementary analyses were conducted on data from the MonashWatch Patient Journey Record (PaJR) relational telehealth system. Instability windows were identified algorithmically (≥2 consecutive calls with Total_Alerts ≥3) across the full longitudinal dataset (16,383 calls, 244 patients, 2.5 years) and classified by linkage to ED and hospital admission data. Window characteristics were compared at window, patient, and paired within-patient levels. Pre-admission signal cascades were analysed in two configurations: a curated pilot sub-cohort (64 patients, 280 calls, ±10-day window, 103 admissions, December 2016-September 2017) and the full monitoring cohort (175 patients, 1,180 pre-admission calls, ±14-day window, December 2016-July 2019). A three-way cross-cohort comparison decomposed differences between the two configurations into pipeline and population effects. Results: 621 instability windows were identified across 157 patients (64% of the monitored cohort). 67.3% resolved without hospital admission or ED attendance, a rate stable across alert thresholds 1-5.
In paired within-patient analysis (n = 70), duration in days (p = 0.002) and multi-domain breadth (p < 0.001) distinguished admitted from resolved windows; alert intensity did not. In the pilot sub-cohort, patient-reported illness prognosis (Q21) was the dominant pre-admission signal (GEE beta = +0.058, AUC = 0.647, p-BH = 0.018). This finding did not replicate in the full cohort: Q21 was non-significant (GEE beta = -0.008, p = 0.154, AUC = 0.507). Cross-cohort analysis identified selective curation of the pilot sub-cohort as the primary explanation. In the full cohort, six signals escalated significantly before admission after Benjamini-Hochberg correction: total alerts, health impairment (Q26), red alerts, self-rated health (Q3), patient concerns (Q1), and operator concern (Q34). Health impairment achieved the highest individual AUC (0.605) and showed the longest pre-admission lead. No individual signal exceeded AUC 0.61. Conclusions: Two thirds of instability phases resolve without hospitalisation, providing direct empirical support for trajectory plasticity as a clinically frequent phenomenon. Within the same patient, persistence - in duration and in the consistency of high-severity multi-domain flagging across calls - distinguishes trajectories that tip into admission from those that resolve. The Q21 signal reversal between cohorts illustrates how selective curation can produce compelling but non-replicable findings in monitoring research. In the full population, objective alert signals and operator judgement, rather than patient illness prognosis, carry the pre-admission signal.
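
The window rule quoted in the Methods above (at least 2 consecutive calls with Total_Alerts of at least 3) can be sketched as a simple run-detection pass over one patient's call sequence. This is only an illustration of the stated rule: the function name, the list-of-integers input and the sample data are hypothetical, and the actual PaJR pipeline may also account for call dates or gaps between calls, which the abstract does not describe.

```python
from typing import List, Tuple


def find_instability_windows(
    total_alerts: List[int],
    alert_threshold: int = 3,
    min_calls: int = 2,
) -> List[Tuple[int, int]]:
    """Return (first_index, last_index) for each run of consecutive calls
    whose alert count meets the threshold and whose length is >= min_calls."""
    windows = []
    start = None  # index where the current qualifying run began
    for i, alerts in enumerate(total_alerts):
        if alerts >= alert_threshold:
            if start is None:
                start = i
        else:
            # The run (if any) ended at the previous call.
            if start is not None and i - start >= min_calls:
                windows.append((start, i - 1))
            start = None
    # Close a run that extends to the final call.
    if start is not None and len(total_alerts) - start >= min_calls:
        windows.append((start, len(total_alerts) - 1))
    return windows


# Hypothetical Total_Alerts values for one patient's consecutive calls.
calls = [0, 3, 4, 1, 5, 2, 3, 3, 6]
windows = find_instability_windows(calls)
print(windows)  # [(1, 2), (6, 8)]
```

The single high call at index 4 is not flagged because it is not followed by a second qualifying call, which matches the "consecutive" requirement in the stated rule.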

8

Acceptability of an intervention to improve uptake of evidence-based emergency myocardial infarction care in Tanzania: A qualitative study

Sumner, S. F.; Sakita, F. M.; Haukila, K. F.; Wanda, L.; Kweka, G. L.; Mlangi, J. J.; Shayo, P.; Tarimo, T. G.; Khanna, S.; Wang, C.; Pyne, A.; Manavalan, P.; Thielman, N. M.; Bettger, J. P.; Hertz, J. T.

2026-04-11 health systems and quality improvement 10.64898/2026.04.07.26348549 medRxiv

Top 0.2%

2.1%

Acute myocardial infarction (AMI) is an increasing cause of morbidity and mortality in Sub-Saharan Africa (SSA) but is often underdiagnosed and undertreated. To address this gap, the Multicomponent Intervention to Improve Myocardial Infarction Care (MIMIC) was developed and implemented in the emergency department (ED) of a regional referral center in northern Tanzania. We conducted in-depth interviews with 20 key stakeholders (physicians, nurses, administrators, and patients) who participated in MIMIC during the first year of implementation. Purposive sampling was used to recruit a broad range of participants. Interviews were guided by a semi-structured interview guide informed by the Theoretical Framework of Acceptability (TFA). Interview transcripts were thematically analyzed by a team of coders using an inductive, grounded theory approach guided by the seven TFA domains. Nineteen major themes emerged across all TFA domains. Overall, participants described MIMIC as highly acceptable, minimally burdensome, and well-aligned with professional and ethical values. Perceived effectiveness was most emphasized, with staff citing improvements in AMI recognition, ECG and troponin testing, and use of evidence-based therapies. All components were highlighted as effective and easily integrated into existing workflows. Patients valued the educational pamphlet for improving knowledge and self-efficacy, though staff expressed concerns about distributing it during acute care, contributing to inconsistent delivery. Champions were viewed as key in promoting adherence and sustaining implementation of the intervention. MIMIC was widely acceptable in all seven TFA domains among ED providers and patients, with perceived effectiveness driving positive attitudes across stakeholder groups. Use of a co-design approach in MIMIC development likely contributed to high intervention acceptability. Patient education strategies may require adaptation to improve fidelity. 
These findings suggest that continued implementation and future adaptation of MIMIC may be feasible.

9

Moving Beyond Duty Hours: Understanding the Contributors to Internal Medicine Resident Workload and Experience

Bianchina, N.; Fischer, C.; Rai, K.; Clawson, J.; McBeth, L.; Gottenborg, E.; Keniston, A.; Burden, M.

2026-04-11 medical education 10.64898/2026.04.08.26349405 medRxiv

Top 0.2%

1.9%

Background High workload among healthcare workers has increasingly been correlated with poor patient outcomes, inefficient operational and financial outcomes, and burnout. Despite growing literature exploring causes of attending physician workload, there is limited understanding of trainee-specific measures. Objective We aimed to characterize elements contributing to trainee workload and perceived challenges and satisfiers to the trainee workday as a foundation for better understanding and measuring trainee work experience. Methods Internal Medicine and Medicine-Pediatrics residents at an academic medical center were invited to participate in focus groups discussing contributors to inpatient workload and work experience between March and April 2024. A qualitative content analysis identified key metrics of trainee workload and work experience, which were then consolidated into overarching domains. A structured, multi-round rating process ranked the perceived relevance of each metric. Results Twenty residents participated across six focus groups. Analysis of focus groups yielded 297 workload metrics across 28 unique domains. Seventeen domains had metrics identified as highly relevant (median 6-7; IQR < 1) including autonomy, communication, disruptions, task switching, documentation, emotional burden, patient factors, professional fulfillment, rounding, teaming, and work-life balance. Conclusions Resident physicians highlighted complex interactions between clinical factors, work design, and psychosocial dynamics that contribute to their sense of workload. This creates opportunities to develop unique measures of workload to understand the trainee experience better. Further studies are needed to capture the generalizability of these findings and the relationship between these workload domains and patient, organizational, and trainee outcomes with the aim of implementing evidence-based work design.

10

Routine Data for Workforce Equality Monitoring: Ethnic Inequalities in Recruitment and Workforce Representation in Nursing and Midwifery

Boldbaatar, A.; Strahle, S.; Shamsuddin, A.; Henderson, D.

2026-04-03 nursing 10.64898/2026.03.31.26349776 medRxiv

Top 0.3%

1.7%

Aim To examine ethnic inequalities in recruitment outcomes and workforce representation across pay bands among nursing and midwifery staff, and to assess whether routinely collected administrative data can generate reproducible indicators for workforce equality monitoring. Design Retrospective observational study. Methods We analyzed routinely collected administrative data from one NHS Board in Scotland. This included annual staff-in-post data for 2021/22 to 2024/25 and pooled recruitment data on interviewed candidates and conditional job offers for 2021/22 to 2023/24. Ethnicity was grouped as White and non-White. Analyses focused on Bands 5, 6 and 7. Recruitment outcomes were assessed using relative risks for receipt of a conditional job offer among interviewed candidates, comparing White and non-White applicants. Workforce representation across pay bands was assessed using representation quotients. Analyses were descriptive and unadjusted. Results White applicants were more likely than non-White applicants to receive a conditional job offer following interview across all pay bands examined. Inequalities were also evident at Band 5, the usual entry point to registered practice. Workforce composition analyses showed a corresponding gradient in representation, with non-White staff overrepresented in Band 5 and underrepresented in Bands 6 and 7, with little change over the study period. Conclusion Routinely collected administrative data can generate reproducible indicators of ethnic inequality in recruitment and workforce representation. Embedded within existing workforce systems, such analyses could strengthen workforce equality monitoring, support benchmarking and enhance accountability across healthcare settings. Impact Utilising routine administrative data for workforce equality monitoring can support policy and practice aimed at improving accountability, retention and workforce sustainability across health systems. 
Reporting Method This study followed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines. Patient or Public Involvement This study did not include patient or public involvement in its design, conduct, or reporting.

11

ECG spectrogram-based deep learning model to predict deterioration of patients with early sepsis at the emergency department: a study from the Acutelines data- and biobank

van Wijk, R. J.; Schoonhoven, A. D.; de Vree, L.; Ter Horst, S.; Gaidhane, C.; Alcaraz, J. M. L.; Strodthoff, N.; ter Maaten, J. C.; Bouma, H. R.; Li, J.

2026-03-27 emergency medicine 10.64898/2026.03.26.26349371 medRxiv

Top 0.3%

1.7%

Purpose: Early recognition of deterioration in patients with suspected infection at the emergency department (ED) is important. Current clinical scoring systems show limited discriminative performance for early deterioration. Continuous electrocardiogram (ECG) recordings may offer additional dynamic physiological information that can enhance early prediction of deterioration in patients with suspected infection. Methods: We developed a multimodal, ECG-derived spectrogram-based pipeline to predict deterioration within 48 hours of ED admission. We used the first 20 minutes of ECG recordings for the spectrograms. We compared the model with the National Early Warning Score (NEWS), quick Sequential Organ Failure Assessment (qSOFA), a baseline model with vital parameters, sex, and age, and a Heart Rate Variability (HRV) derived model. Results: In this study, 1321 patients were included, of whom 159 (12%) deteriorated. The multimodal model combining baseline data with spectrograms showed the best overall performance, with an Area Under the Receiver Operating Characteristic (AUROC) of 0.788, followed by the baseline model (age, sex, triage vitals) alone, with an AUROC of 0.730. The HRV-only model and the qSOFA showed the lowest performance (AUROC 0.585 and 0.693, respectively). Conclusion: This study shows that ECG-derived multimodal spectrogram models outperform those based solely on vital signs and HRV features, as well as established clinical scores such as NEWS and qSOFA. Spectrogram analysis represents a promising approach to enhance early risk stratification and support clinical decision-making for patients with suspicion of infection in the ED.

12

Preventive care in orthopaedic clinical services - testing the acceptability of an online health risk self-assessment tool using a multi-method design

Davidson, S. R.; Browne, S.; Giles, L.; Gillham, K.; Haskins, R.; Campbell, E.

2026-04-10 public and global health 10.64898/2026.04.09.26350435 medRxiv

Top 0.3%

1.5%

Background Musculoskeletal conditions, such as back pain and osteoarthritis, are common and disabling disorders. Musculoskeletal conditions are closely related to chronic disease risk factors like smoking/vaping, poor nutrition, alcohol misuse and physical inactivity and impact a person's risk of falling (SNAPF). Preventive care for SNAPF risks is often overlooked. Online delivery of preventive care may increase the provision of this care. We aimed to assess if an online tool for SNAPF risks would be used by and acceptable to patients waiting for an orthopaedic consultation. Methods We completed a multi-method study to test an online health risk self-assessment tool. A random sample of 300 people on the orthopaedic outpatient waiting list aged 18-64 years were sent the tool in batches of 20-50. The tool assessed SNAPF risks and provided feedback against national guidelines. After each batch, we completed feedback interviews with participants to assess acceptability and updated the tool. We summarised quantitative data using descriptive statistics and qualitative data using thematic analysis. Results Of the 300 participants sent the tool, 51.3% were female, 8.6% identified as Aboriginal and/or Torres Strait Islander, with a mean (SD) age of 52.0 years (11.2). There were 170 participants (59.2%) who completed the tool, 117 who did not complete it, and 13 participants who were excluded from analysis because they did not receive the SMS. We conducted 184 feedback interviews, including 125 'completers' and 59 'non-completers'. The percentage of participants who felt that SMS was an appropriate way to receive the tool was 84.7% of 'completers' and 50% of 'non-completers'. The two most common reasons for not completing the tool were perceived risk (13/59, 22.0%) and receiving the SMS at an inconvenient time (11/59, 18.6%).
Qualitative data from the feedback interviews captured three enablers: i) design, ii) high importance, and iii) engagement with health service, along with four barriers: i) design, ii) risk, iii) relevance, and iv) engagement with health service. Conclusion Our study found that an online health risk self-assessment tool appears to be an acceptable way to assess chronic disease and falls risk factors for people on an orthopaedic waitlist.

13

Effect of NHS surgical hubs on elective primary hip-and-knee replacement volume, length of stay and waiting times: national longitudinal difference-in-differences study

Wen, J.; Anteneh, Z.; Castelli, A.; Street, A.; Gutacker, N.; Scantlebury, A.; Glerum-Brooks, K.; Davies, S.; Bloor, K.; Rangan, A.; Castro Avila, A.; Lampard, P.; Adamson, J.; Sivey, P.

2026-04-22 health policy 10.64898/2026.04.21.26351383 medRxiv

Top 0.3%

1.4%

Objectives To evaluate the effect of surgical hubs on the volume of surgeries, patient waiting times, and length of hospital stay for elective hip and knee replacements in the English NHS. Design A retrospective longitudinal study using a difference-in-differences approach to compare changes in outcomes at NHS trusts that opened surgical hubs with those that did not. Setting The study was set in the English NHS, using administrative data from NHS acute trusts providing elective hip and knee replacements between April 2014 and September 2024. Participants The study included 76 NHS trusts. The treatment group consisted of 29 trusts that opened a surgical hub for trauma and orthopaedic surgery during the study period. The control group consisted of 47 trusts that did not. 48 trusts that performed fewer than 1,000 relevant procedures over the ten-year period or that reported data for fewer than 41 of the 42 quarters in the sample period were excluded. Intervention The phased introduction of surgical hubs dedicated to elective procedures at 29 NHS trusts between Q1 2020 and Q3 2024. Main outcome measures The three main outcomes, measured at the trust-quarter level, were: the total number of elective primary hip and knee replacements (surgical volume), the average length of stay in hospital, and the average waiting time from being added to the waiting list to hospital admission. Results The opening of a surgical hub was associated with an increase of 43.75 hip and knee replacement surgeries per quarter (95% CI: 22.22 to 65.28), which represents a 19.1% increase compared to the pre-hub mean. Length of stay was reduced by 0.32 days (95% CI: -0.48 to -0.16), a 7.8% reduction. There was no statistically significant effect on average waiting times (-14.96 days, 95% CI: -33.11 to 3.19). Conclusions Surgical hubs appear to be effective at increasing the number of hip and knee replacements and reducing the time patients spend in hospital.
However, in this study, they did not lead to a statistically significant reduction in waiting times overall.

14

Classification of Recurrence Status After Surgical Treatment of Chronic Subdural Hemorrhage - A Machine Learning Approach

Hamou, H.; Kernbach, J.; Ridwan, H.; Fay-Rodrian, K.; Clusmann, H.; Hoellig, A.; Veldeman, M.

2026-03-27 neurology 10.64898/2026.03.25.26349323 medRxiv

Top 0.3%

1.4%

Background Chronic subdural hematoma (cSDH) recurrence requiring reoperation occurs in 5-33% of cases, representing a substantial clinical and economic burden. The ability to predict recurrence could enable risk-stratified surveillance protocols, potentially reducing imaging burden in low-risk patients while maintaining close monitoring for high-risk individuals. We evaluated whether machine learning algorithms could achieve clinically actionable recurrence prediction using routinely available clinical and radiographic variables. Methods This retrospective single-center study included 564 consecutive patients who underwent surgical evacuation of cSDH between 2015 and 2023. Data were randomly divided into training (75%, n=422) and test (25%, n=142) sets. We developed and compared three machine learning models--regularized logistic regression, Random Forest, and XGBoost--using 31 predictor variables including demographics, comorbidities, medications, laboratory values, hematoma characteristics, and postoperative features. Model development and hyperparameter tuning were performed exclusively on the training set using 10-fold cross-validation. The best-performing model was selected and evaluated on the held-out test set. The primary outcome was postoperative recurrence requiring reoperation. Results Postoperative recurrence occurred in 170 patients (30.1%). Within the training set, XGBoost achieved the highest cross-validated ROC AUC of 0.713 (SE=0.024), outperforming regularized logistic regression (0.686) and matching Random Forest (0.713). Variable importance analysis identified hematoma volume, coagulation parameters (INR, platelets, aPTT), and disease severity markers (ICU admission, GCS) as the most influential predictors, though absolute effect sizes remained modest. On the held-out test set, the final XGBoost model achieved ROC AUC 0.688 (95% CI: 0.590-0.772) with excellent calibration. 
However, at the clinically relevant 90% sensitivity threshold, test set specificity was only 30.3%, allowing potential imaging reduction in approximately one-third of non-recurrence patients. The consistency between training and test performance confirmed that limitations stem from inherent predictor information content rather than overfitting. Conclusions Machine learning models using routinely available clinical and radiographic variables cannot achieve clinically actionable risk stratification for cSDH recurrence. Despite rigorous methodology and internal validation, discriminative capacity remained insufficient to identify a low-risk patient subgroup suitable for de-escalated surveillance. These findings suggest that recurrence is driven by factors not captured in standard clinical assessment, and support either uniform surveillance protocols or symptom-driven imaging strategies rather than risk-stratified approaches.

15

A Retrospective Propensity Score Matched Cohort Study Comparing Intact Fish Skin Graft with Synthetic and Biosynthetic Dermal Substitutes for Acute Burn Injuries Requiring Dermal Substitution and Autografting: Outcomes from the American Burn Association Registry

Sood, R.; Hevelone, N. D.; Davidsson, O. B.; Kristjansson, R. P.; Phillips, B. D.; Lantis, J. C.; Johannsson, G.

2026-04-16 intensive care and critical care medicine 10.64898/2026.04.14.26350896 medRxiv

Top 0.4%

1.3%

Objective: The objective of this study was to compare hospital length of stay and other clinical outcomes between intact fish skin graft (IFSG; Graftguide, Kerecis, Arlington, VA) and synthetic/biosynthetic dermal substitutes (SSS; Integra Dermal Regeneration Template and NovoSorb Biodegradable Temporizing Matrix) in propensity score matched burn patients using the American Burn Association Burn Care Quality Platform. Methods: This retrospective cohort study identified adult patients treated with a single dermal substitute product during hospitalization for acute burn injury. Patients receiving IFSG (n = 93) were matched 1:4 to patients receiving SSS (n = 372) using nearest neighbor propensity score matching on the logit scale. Matching covariates included total body surface area burned (TBSA), patient age, sex, burn severity classification, inhalation injury, and trauma diagnosis. The primary outcome was hospital length of stay (LOS), analyzed using a gamma generalized linear mixed model (GLMM). Secondary outcomes included the incidences of sepsis, graft loss, venous thromboembolism (VTE), and hospital acquired pressure injury (HAPI). A prespecified sensitivity analysis was performed using a broader mixed product cohort. Results: A total of 93 IFSG treated patients from 17 burn centers admitted between the years 2019 and 2025 were matched 1:4 to 372 SSS treated patients from 44 centers. Unadjusted mean LOS was 24.1 days (median 20, IQR 11 to 32) in the IFSG treated group and 36.7 days (median 31, IQR 17 to 52) in the SSS treated group, representing a 12.6 day reduction. GLMM-adjusted estimated marginal mean LOS was 24.2 days (95% CI, 20.0 to 29.4) for IFSG versus 33.5 days (95% CI, 30.0 to 37.6) for SSS (ratio 0.723; p = 0.00245), representing a 9.3 day reduction. 
Sepsis (1.1% vs 4.6%), graft loss (3.2% vs 8.3%), VTE (2.2% vs 2.7%), and HAPI (2.2% vs 3.8%) were all numerically lower in the IFSG treated arm, although GLMM-adjusted odds ratios were not statistically significant for any individual complication. The mixed cohort sensitivity analysis (n = 229 IFSG vs 458 SSS across 67 centers) confirmed the primary finding with GLMM adjusted LOS ratio 0.716 (p = 0.0001). Conclusions: In this propensity score matched analysis of the ABA registry, IFSG was associated with a statistically significant and clinically meaningful reduction in hospital length of stay compared with synthetic/biosynthetic dermal substitutes in burns requiring dermal substitution and autografting, with all complication rates (sepsis, graft loss, VTE, and HAPI) numerically lower in the IFSG-treated arm. The shorter hospitalization was not achieved at the expense of safety. These findings support IFSG as a viable alternative to synthetic dermal substitutes in burns requiring dermal substitution and autografting. Prospective studies are warranted, particularly in larger burns requiring staged reconstruction.

16

Availability and Quality of Anthropometric Data in Swiss Children's Hospitals: The SwissPedGrowth Project

Leuenberger, L. M.; Shoman, Y.; Romero, F.; Deligianni, X.; Hartung, A.; Mozun, R.; Goebel, N.; Bielicki, J. A.; Burckhardt, M.-A.; Latzin, P.; Saner, C.; Posfay-Barbe, K. M.; Schwitzgebel, V.; Giannoni, E.; Hauschild, M.; Stocker, M.; Righini-Grunder, F.; Lauener, R.; Mueller, P.; Schlapbach, L. J.; Jenni, O. G.; Spycher, B. D.; Kuehni, C. E.; Belle, F. N.; for the SwissPedHealth Consortium,

2026-03-30 health informatics 10.64898/2026.03.27.26349493 medRxiv

Top 0.4%

1.0%

OBJECTIVE: Anthropometric data are critical in paediatric care, routinely assessed during clinical visits, and available in electronic health records (EHRs). We describe the feasibility of extracting anthropometric data from heterogeneous EHR systems of Swiss children's hospitals, evaluate their availability and quality, and assess the cohort's representativeness of the general population. METHODS: In this multicentre study (SwissPedGrowth), we retrospectively collected EHRs from patients <20 years who visited hospitals in Basel, Bern, Geneva, Lausanne, Luzern, St. Gallen, or Zurich between 2017 and 2023. Sociodemographic, administrative, and clinical information from EHRs was provided in a standardized way by a paediatric national data stream (SwissPedHealth), including the Swiss Neighbourhood Index of Socioeconomic Position (Swiss-SEP). We counted anthropometric recordings per visit to describe availability and used a self-developed and an existing (growthcleanr) algorithm to investigate data quality. To assess representativeness, we compared sociodemographic characteristics between SwissPedGrowth and the general paediatric population in Switzerland, computed standardized differences (effect size: 0.2 small, 0.5 medium, 0.8 large), and weighted the study population to reduce differences. RESULTS: We included 477,531 patients and 2,171,633 hospital visits; 54% boys, 71% Swiss, mean Swiss-SEP 65 (SD: 11), and median age at visit 6.3 [IQR: 2.3, 11.8] years. Height recordings were available for 20% of the visits, weights for 43%, and head circumferences for 5%, with better availability for inpatient stays than outpatient or emergency visits. Combining the self-developed and existing algorithms, 4% of heights and 3% of weights were flagged as outliers and 29% of heights and 31% of weights as carried forward from previous visits or same-day duplicates. 
Sociodemographic differences between SwissPedGrowth and the general population were small or small-to-medium and disappeared after weighting. CONCLUSION: SwissPedGrowth demonstrates feasibility of extracting high-quality anthropometric data for paediatric growth research, but challenges regarding completeness and harmonization of EHR data across Swiss hospitals remain.

17

Comparing prognostic performance and reasoning between large language models and physicians

Gjertsen, M.; Yoon, W.; Afshar, M.; Temte, B.; Leding, B.; Halliday, S.; Bradley, K.; Kim, J.; Mitchell, J.; Sanders, A. K.; Croxford, E. L.; Caskey, J.; Churpek, M. M.; Mayampurath, A.; Gao, Y.; Miller, T.; Kruser, J. M.

2026-04-25 intensive care and critical care medicine 10.64898/2026.04.17.26350898 medRxiv

Top 0.4%

0.9%

Importance: Physicians routinely prognosticate to guide care delivery and shared decision making, particularly when caring for patients with critical illnesses. Yet, these physician estimates are prone to inaccuracy and uncertainty. Artificial intelligence, including large language models (LLMs), shows promise in supporting or improving this prognostication. However, the performance of contemporary LLMs in prognosticating for the heterogeneous population of critically ill patients remains poorly understood. Objective: To characterize and compare the performance of LLMs and physicians when predicting 6-month mortality for hospitalized adults who survived critical illness. Design: Embedded mixed methods study with elicitation and comparison of prognostic estimates and reasoning from LLMs and practicing physicians. Setting: The publicly available, deidentified Medical Information Mart for Intensive Care (MIMIC)-IV v2.2 dataset. Participants: We randomly selected 100 hospitalizations of adult survivors of critical illness. Four contemporary LLMs (OpenAI GPT-4o, o3- and o4-mini, and DeepSeek-R1) and 7 physicians provided independent prognostic estimates for each case (1,100 total estimates; 400 LLM and 700 physician). Main outcomes and measures: For each case, LLMs and physicians used the hospital discharge summary and demographics to predict 6-month mortality (yes/no) and provide their reasoning (free text). We assessed prognostic performance using accuracy, sensitivity, and specificity, and used inductive, qualitative content analysis to characterize reasoning. Results: Mean physician accuracy for predicting mortality was 70.1% (95% CI 63.7-76.4%), with sensitivity of 59.7% (95% CI 50.6-68.8%) and specificity of 80.6% (95% CI 71.7-88.2%). The top-performing LLM (OpenAI o4-mini) accuracy was 78.0% (95% CI 70.0-86.0%), with sensitivity of 80.0% (95% CI 67.4-90.2%) and specificity of 76.0% (95% CI 63.3-88.0%). 
The difference between mean physician and top-performing LLM accuracy was not statistically significant (p = 0.5). Qualitative analysis revealed similar patterns in LLM and physician expressed reasoning, except that physicians regularly and explicitly reported uncertainty while LLMs did not. Conclusion and Relevance: In this study, LLMs and physicians achieved comparable, moderate performance in predicting 6-month mortality after critical illness, with similar patterns in expressed reasoning. Our findings suggest LLMs could be used to support prognostication in clinical practice but also raise safety concerns due to the lack of LLM uncertainty expression.

18

Association of Otolithic Integrity With Subjective and Functional Outcomes in Vestibular Rehabilitation: A Pilot Study

Cortes, Y. H.; Ramos Maldonado, D.; Romo, V. S.; Annel, G.-C.; Leyva, I. C.

2026-04-03 rehabilitation medicine and physical therapy 10.64898/2026.04.01.26349994 medRxiv

Top 0.4%

0.9%

Variable recovery in vestibular rehabilitation underscores the need for objective biomarkers to identify patients at risk of poor clinical outcomes. This study aimed to establish proof of concept for a multidimensional prognostic framework using structural cervical vestibular evoked myogenic potential (cVEMP) and functional modified Clinical Test of Sensory Interaction on Balance (mCTSIB) markers to predict therapeutic success. This prospective cohort study was conducted at a tertiary rehabilitation center between June 2023 and May 2025. Participants were adults with peripheral vestibular disorders, including unilateral vestibular dysfunction, Meniere disease, or superior semicircular canal dehiscence. All participants underwent a customized five-session vestibular rehabilitation protocol. Primary outcomes were subjective clinical success, defined as an 18-point reduction in Dizziness Handicap Inventory (DHI) score, and functional success, defined as a 3-point increase in Dynamic Gait Index score. Among 30 participants (mean age 60.8 years; 77% female), the rehabilitation protocol was associated with significant improvements in mean DHI (53.7 to 37.8; P = .003) and Dynamic Gait Index (19.5 to 22.1; P = .003) scores. While 83% of participants showed raw DHI improvement, only 37% achieved the 18-point minimal clinically important difference. Notably, no participants in the bilateral cVEMP absence group achieved subjective success, compared with 52.6% in the bilateral present group (P trend = .08). Multivariable logistic regression identified baseline DHI severity as an independent predictor of success (odds ratio, 1.05; 95% CI, 1.00-1.10; P = .04). Functional gait success was significantly correlated with baseline vestibular and visual preference ratios. These findings suggest that baseline otolithic structural integrity is a primary determinant of subjective recovery. 
Bilateral structural loss may represent a "structural floor" where meaningful relief is physiologically limited despite functional gains. These results support a precision-based model using structural and sensory biomarkers to tailor rehabilitation.

19

Evolving concerns about the COVID-19 pandemic: A content analysis of free-text reports from the UK COVID-19 Public Experiences (COPE) study cohort over a two-year period

Phillips, R.; Wood, F.; Torrens-Burton, A.; Glennan, C.; Sellars, P.; Lowe, S.; Caffoor, A.; Hallingberg, B.; Gillespie, D.; Shepherd, V.; Poortinga, W.; Wahl-Jorgensen, K.; Williams, D.

2026-04-19 public and global health 10.64898/2026.04.16.26351013 medRxiv

Top 0.5%

0.9%

Objectives Concerns about COVID-19 were a key driver of infection-prevention behaviour during the pandemic. The aim of this study was to gain an in-depth longitudinal understanding of the type and frequency of concerns experienced throughout the first two years of the COVID-19 pandemic. Design Content analysis of qualitative descriptions provided in a prospective longitudinal online survey as part of the COVID-19 UK Public Experiences (COPE) Study. Method At baseline (March/April 2020), when the UK entered its first national lockdown, 11,113 adults completed the COPE survey. Follow-up surveys were conducted at 3, 12, 18 and 24 months. Participants were recruited via the HealthWise Wales research registry and social media. Baseline surveys collected demographic and health data, and all waves included an open-ended question about COVID-19 concerns. Content analysis was used to identify the type and frequency of concerns at each time point. Results A total of 41,564 open-text responses were coded into six categories: personal harm (n=16,353), harm to others (n=11,464), social/economic impact (n=6,433), preventing transmission (n=4,843), government/media (n=1,048), and general concerns (n=1,423). The proportion of respondents reporting any concern declined from 75.3% at baseline to 65.8% at 24 months. Over time, concerns about personal harm increased (baseline 41.8% vs. 24 months 52.7%) whereas concerns about harm to others decreased (baseline 48.5% vs. 24 months 28.6%). Concerns about harm were also expressed in relation to clinical vulnerability, lack of trust in government/media, and perceived lack of adherence by others. These were balanced against concerns about wider social and economic impacts of restrictions. Conclusions Public concerns about COVID-19 evolved substantially over the first two years of the pandemic, reflecting changing perceptions of risk and responsibility. 
Monitoring concerns longitudinally is vital to help guide effective communication and behavioural interventions during future pandemics.

20

Exploring the Impact of a Medical Device Recall on Individuals with Obstructive Sleep Apnea and Healthcare Providers: A Qualitative Study

Pendharkar, S.; Blades, K.; Yazji, B.; Ayas, N.; Owens, R.; Kaminska, M.; Mackenzie, C.; Gershon, A.; Ratycz, D.; Lischenko, V.; Fenton, M. E.; McBrien, K.; Povitz, M.; Kendzerska, T.

2026-03-27 respiratory medicine 10.64898/2026.03.25.26349320 medRxiv

Top 0.5%

0.8%

Purpose: To understand how the Philips PAP device recall affected patient experiences, clinical practice, and health system responses. Methods: From November 2022 to August 2023, we interviewed individuals with OSA, physicians, respiratory therapists and health system leaders. We also received emailed responses from Health Canada. Interviews explored participants' experiences with the recall announcement and communication, their own responses and perceptions of actions taken by others, the overall impact of the recall and suggestions for improving future recall processes. Interviews were analyzed using an inductive thematic approach. Results: We interviewed 47 participants (16 individuals with OSA, 10 physicians, 17 public or private respiratory therapists, five health system leaders). Themes were organized into four domains: recall communication, execution, participant experiences, and the policy and regulatory context. Participants were confused due to inadequate information from Philips throughout the process. The burden of notifying patients and tracing devices mostly fell to healthcare providers and vendors, while replacement efforts were disorganized and frustrating. Individuals with OSA experienced emotional distress over therapy decisions and difficulties navigating the recall. Healthcare providers described moral distress from being unable to support patients adequately, and vendors faced additional logistical and financial strain. While regulatory authorities reported that Philips followed standard procedures, participants expressed a loss of trust in both the manufacturer and oversight systems. Conclusions: Interviews revealed that poor communication and execution of the Philips recall caused confusion, frustration and significant emotional and financial burden. Collaborative, context-specific strategies are required to improve future recalls.